Overview
The choice of basecaller, and of the model it runs, affects the base
quality of Nanopore reads. Here we tested the basecallers Guppy and
Bonito on fungal genomic data generated using R9.4.1 flow cells and the
SQK-LSK109 ligation sequencing kit. Genomic DNA was extracted from
cultured mycelia of a Pyricularia oryzae isolate, which causes blast
disease in wheat.
To evaluate read error rates, basecalled reads were aligned to a
high-quality reference genome that was assembled and polished from data
generated from the same isolate. The accuracy of the reference genome is
estimated to be higher than 99.9%. Therefore, polymorphisms detected
between reads and the reference genome are taken to represent sequencing
errors.
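The aligner and the exact error metric are not stated here, so the following is only a minimal sketch of one common definition: the per-read error rate is the edit distance (the NM tag of a SAM/BAM record) divided by the number of alignment columns counted from the CIGAR string. The function name `read_error_rate` and the example records are hypothetical.

```python
import re
from statistics import median

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_error_rate(cigar: str, nm: int) -> float:
    """Edit distance (NM) divided by the number of alignment columns.

    Alignment columns are matches/mismatches (M, =, X), insertions (I)
    and deletions (D). Clipping (S, H), skips (N) and padding (P) are ignored.
    """
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MID=X")
    if columns == 0:
        raise ValueError("CIGAR string has no aligned columns")
    return nm / columns


# Hypothetical (CIGAR, NM) pairs standing in for real alignment records.
example_reads = [("100M", 3), ("5S90M3I4D3M", 10), ("50M", 2)]
rates = [read_error_rate(cigar, nm) for cigar, nm in example_reads]
median_rate = median(rates)
print(f"median error rate: {median_rate:.1%}")
```

Under this definition, residual errors in the reference also count against the reads, which is why a reference accuracy above 99.9% matters for the comparison.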
Three sets of reads were generated from the same Nanopore fast5 data
using Bonito (version 0.5.3), Guppy (version 3.4.4), and Guppy (version
6.1.7). For Bonito, the model dna_r9.4.1_e8.1_sup@v3.3 was
used. For Guppy, the configuration dna_r9.4.1_450bps_hac.cfg was used
with version 3.4.4, and the configuration dna_r9.4.1_450bps_sup.cfg was
used with version 6.1.7.
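As a rough illustration of how the three runs differ only in tool and model, the sketch below assembles the basecalling command lines as argument lists without executing them. The directory names are hypothetical, the exact options used for this post are not recorded here, and each Guppy version would need its own installation.

```python
import shlex


def guppy_command(fast5_dir: str, out_dir: str, config: str) -> list[str]:
    """Argument list for a Guppy run with the given configuration file."""
    return [
        "guppy_basecaller",
        "--input_path", fast5_dir,
        "--save_path", out_dir,
        "--config", config,
    ]


def bonito_command(model: str, fast5_dir: str) -> list[str]:
    """Argument list for a Bonito run; Bonito writes basecalls to stdout."""
    return ["bonito", "basecaller", model, fast5_dir]


# "fast5", "guppy344_hac" and "guppy617_sup" are placeholder directory names.
commands = {
    "guppy 3.4.4": guppy_command("fast5", "guppy344_hac", "dna_r9.4.1_450bps_hac.cfg"),
    "guppy 6.1.7": guppy_command("fast5", "guppy617_sup", "dna_r9.4.1_450bps_sup.cfg"),
    "bonito 0.5.3": bonito_command("dna_r9.4.1_e8.1_sup@v3.3", "fast5"),
}

for label, argv in commands.items():
    print(f"{label}: {shlex.join(argv)}")
```

The lists could be passed to `subprocess.run` on a machine where the tools are installed; for Bonito, the standard output would need to be redirected to a file.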
Results
Comparing the base quality of reads from Guppy 3.4.4 and 6.1.7 showed
that Guppy 6.1.7 produced much higher-quality reads. The median error
rate dropped by almost half from 3.4.4 to 6.1.7 (6.2% to 3.3%). The
comparison is not entirely fair, however, because the “hac” model was
used for 3.4.4 and the “sup” model was used for 6.1.7.
Comparing the base quality of reads from Bonito 0.5.3 and Guppy 6.1.7
showed that Guppy 6.1.7 produced higher-quality reads. The median error
rates were 4.8% and 3.3% for Bonito and Guppy 6.1.7, respectively.
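To put the three medians on a common footing, the short sketch below computes the relative reduction in error rate and the equivalent Phred-scaled quality, Q = -10 log10(error rate). Only the three median values come from the results above; the helper names are illustrative.

```python
import math

# Median per-read error rates reported above.
median_error = {
    "Guppy 3.4.4 (hac)": 0.062,
    "Bonito 0.5.3 (sup)": 0.048,
    "Guppy 6.1.7 (sup)": 0.033,
}


def relative_reduction(old: float, new: float) -> float:
    """Fraction by which the error rate falls when moving from old to new."""
    return (old - new) / old


def phred(error_rate: float) -> float:
    """Phred-scaled quality corresponding to an error rate."""
    return -10 * math.log10(error_rate)


best = median_error["Guppy 6.1.7 (sup)"]
for name, rate in median_error.items():
    print(f"{name}: error {rate:.1%}, Q{phred(rate):.1f}, "
          f"reduction with Guppy 6.1.7: {relative_reduction(rate, best):.0%}")
```

Moving from 6.2% to 3.3% is a relative reduction of about 47%, which matches the "almost half" figure; moving from Bonito's 4.8% to 3.3% is a reduction of about 31%.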
Conclusion
The super-accuracy model in Guppy produced higher-quality data than its
counterpart model in Bonito. Bonito models can, however, be trained on a
user's own data, which could be tested in the future.